Generating Patterns for Extracting Chinese-Korean Named Entity Translations from theWeb

نویسندگان

  • Chih-Hao Yeh
  • Wei-Chi Tsai
  • Yu-Chun Wang
  • Richard Tzong-Han Tsai
چکیده

One of the main difficulties in Chinese-Korean cross-language information retrieval is to translate named entities (NE) in queries. Unlike common words, most NE’s are not found in bilingual dictionaries. This paper presents a pattern-based method of finding NE translations online. The most important feature of our system is that patterns are generated and weighed automatically, saving considerable human effort. Our experimental data consists of 160 Chinese-Korean NE pairs selected fromWikipedia in five domains. Our approach can achieve a very high MAP of 0.84, which demonstrates our system’s practicability.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Learning Patterns from the Web to Translate Named Entities for Cross Language Information Retrieval

Named entity (NE) translation plays an important role in many applications. In this paper, we focus on translating NEs from Korean to Chinese to improve Korean-Chinese cross-language information retrieval (KCIR). The ideographic nature of Chinese makes NE translation difficult because one syllable may map to several Chinese characters. We propose a hybrid NE translation system. First, we integr...

متن کامل

A Voting Mechanism for Named Entity Translation in English – Chinese Question Answering

In this paper, we describe a voting mechanism for accurate named entity (NE) translation in English–Chinese question answering (QA). This mechanism involves translations from three different sources: machine translation, online encyclopaedia, and web documents. The translation with the highest number of votes is selected. We evaluated this approach using test collection, topics and assessment r...

متن کامل

Chinese Syntactic Reordering through Contrastive Analysis of Predicate-predicate Patterns in Chinese-to-Korean SMT

We propose a Chinese dependency tree reordering method for Chinese-to-Korean SMT systems through analyzing systematic differences between the Chinese and Korean languages. Translating predicate-predicate patterns in Chinese into Korean raises various issues such as long-distance reordering. This paper concentrates on syntactic reordering of predicate-predicate patterns in Chinese dependency tre...

متن کامل

بهبود شناسایی موجودیت‌های نامدار فارسی با استفاده از کسره اضافه

Named entity recognition is a process in which the people’s names, name of places (cities, countries, seas, etc.) and organizations (public and private companies, international institutions, etc.), date, currency and percentages in a text are identified. Named entity recognition plays an important role in many NLP tasks such as semantic role labeling, question answering, summarization, machine ...

متن کامل

Improving Translation of Queries with Infrequent Unknown Abbreviations and Proper Names

Unknown term translation is important to CLIR and MT systems, but it is still an unsolved problem. Recently, a few researchers have proposed several effective search-result-based term translation extraction methods which explore search results to discover translations of frequent unknown terms from Web search results. However, many infrequent unknown terms, such as abbreviations and proper name...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008